I. INTRODUCTION


In August 1979, Dr. Norman R. Draper of the Mathematics Research
Center, University of Wisconsin, conducted a three-day course titled
"Regression Theory" sponsored by the U.S. Army Research Office at the
Edgewood Area of Aberdeen Proving Ground, Maryland. Because of the
interest created by this course, a review of the Ballistic Research
Laboratory's (BRL) Stepwise Multiple Regression package [1,2] was made.

This review disclosed that BRL's stepwise multiple regression package
was essentially developed in 1967, subject to the constraints of the
1967 hardware, and is therefore limited in some of the statistical tests
now commonly used. This computer package is currently the workhorse in
linear regression at BRL.

A recommendation by Dr. N. R. Draper on linear regression statistical
computer packages was requested in October 1979. Dr. Draper recommended
looking at a number of commonly used statistical packages, which are
listed below:

•  Biomedical Computer Package (BMD [4], BMDP [5])

•  Statistical Package for the Social Sciences (SPSS [6])

•  International Mathematical and Statistical Library (IMSL [7])

Because of the limitations in the BRL program, this survey was
undertaken to investigate a variety of regression packages, including the
three recommended by Dr. Draper.


1. H.J. Breaux, On Stepwise Multiple Linear Regression, Ballistic
Research Laboratories Report No. 1369, August 1967. (AD #658674)

2. H.J. Breaux, L.W. Campbell, J.C. Torrey, Stepwise Multiple Regression
Statistical Theory and Computer Program Description, BRL Report
No. 1330, July 1966. (AD #639955)

3. N.R. Draper, H. Smith, Applied Regression Analysis, John Wiley &
Sons, Inc., 1966.

4. W.J. Dixon, Biomedical Computer Program (BMD), University of
California Press, 1973.

5. W.J. Dixon, Biomedical Computer Program (BMDP), University of
California Press, 1975.

6. N.H. Nie, C.H. Hull, J.G. Jenkins, K. Steinbrenner, D.H. Bent,
Statistical Package for the Social Sciences (SPSS), McGraw-Hill
Inc., 1975.

7. IMSL Library, Reference Manual, IMSL LIB-0007, Revised January
1979, Edition 7.

It should be understood that a vast range of computer packages is
available, but that only a small subset was surveyed. This survey of
linear regression packages had two purposes: (i) to familiarize users with
the range of software now available and (ii) to promote an understanding
of the techniques used.

Selected problems and their discussions in Dr. Draper's text [3],
Applied Regression Analysis, were used as a guide in developing the list of
recommended regression statistics presented below. Based upon these
discussions, the comparison table of several available computer packages
was then developed.

II.  RECOMMENDED  STATISTICAL  FEATURES  FOR  A  LINEAR  REGRESSION  PACKAGE 

Linear  regression  is  utilized  primarily  to  investigate  relations 
between  sets  of  variables  and  some  response  variable.  These  relations 
are  sometimes  utilized  to  establish  predictions  on  a  response  variable. 

No  matter  how  linear  regression  is  used,  this  form  of  statistical  analysis 
requires  the  calculation  of  associated  statistics  and  statistical  tests  to 
evaluate  the  level  and  significance  of  the  overall  analysis.  The  following 
is  a  list  of  statistics  and  statistical  tests  which  can  be  used  to 
expound  upon  the  significance  of  the  linear  regression  analysis. 

The  first  part  (1-3)  of  the  listing  is  simply  a  statement  of  the 
problem  and  the  raw  data  used.  The  second  part  (4-8)  is  a  set  of  statistics 
to  compare  each  of  the  many  separate  regression  fits  to  one  another.  The 
last  part  (9-12)  evaluates  the  goodness  of  the  present  regression  analysis 
for  overall  interpretation.  This  recommended  list  is  not  intended  to  be 
complete,  but  rather  it  is  to  be  used  as  a  guide  to  judge  the  analysis 
and  to  aid  in  surveying  the  following  regression  packages. 

A LIST OF RECOMMENDED REGRESSION STATISTICS

1. A list of each variable and the stated regression problem.

Regression Problem - REGRESSION ANALYSIS

List of Variables - x1 = average monthly temperature (°F)
                    x2 = number of monthly personnel
                    x3 = average monthly production (rds)
                    x4 = number of operating days in the month
                    y  = energy usage (dollars)

2. A listing of the original and transformed data. To check the correctness of the input data and any
data transformation:

OBSERVATIONS

 j    x1    x2    x3    x4    ...   xk    sin x1     y

 1    x11   x21   x31   x41   ...   xk1   sin x11    y1
 2    x12   x22   x32   x42   ...   xk2   sin x12    y2
 3    x13   x23   x33   x43   ...   xk3   sin x13    y3
 .
 .
 n    x1n   x2n   x3n   x4n   ...   xkn   sin x1n    yn

3. A list of Standard Statistics for each variable. To examine the data being analyzed
(variables k+1 and k+2 are the transformed variable sin x1 and the response y):

mean (i = 1, 2, ..., k, k+1, k+2):

   x̄1    x̄2    x̄3    x̄4    ...   x̄k+1    ȳ

standard deviation (i = 1, 2, ..., k, k+1, k+2):

   s1    s2    s3    s4    ...   sk+1    sk+2

range (maximum; minimum):

   x1 max    x2 max    x3 max    ...   y max
   x1 min    x2 min    x3 min    ...   y min

correlation matrix:

   r1,1      r1,2      r1,3      ...   r1,k+2
   r2,1      r2,2      r2,3      ...   r2,k+2
   ...
   rk+2,1    rk+2,2    rk+2,3    ...   rk+2,k+2

4. The Current Regression Equation being fitted:

$y = f(x_i, x_j, \ldots)$

5. The Last Variable entering the regression analysis:

(For sequential comparisons with previous regression models)

last variable entered = $x_j$.

6.  A  Sequential  F-Test  is  a  test  to  measure  the  significance  of  the  entering  variable  into  the  regression 
equation. 

7. Multiple Correlation Coefficient ($R^2$) is a measure of the variation being explained by the current
regression model.

Percent variation explained = 42.07%

8.  The  standard  deviation  of  residuals  is  a  measure  of  the  unexplained  variation  in  the  response  variable. 

9. Analysis of Variance (ANOVA) Table for regression model is a measure of the regression model relative
to overall variation.

Source                  df             SS     MS           Overall F

Total                   N-1            SST    -            -
Regression (x2, x4)     2 (K)          SSR    SSR/2        F(2, N-3)
Residual                N-3 (N-K-1)    SSE    SSE/(N-3)

10. The estimated beta ($\hat{\beta}_i$) coefficients and confidence intervals for each estimated parameter:

Var No.          Coeff              __% Confidence Interval     st. error                Partial F
                                    (Upper/Lower)

4                $\hat{\beta}_4$    (U4/L4)                     st. ($\hat{\beta}_4$)    F4/2
2                $\hat{\beta}_2$    (U2/L2)                     st. ($\hat{\beta}_2$)    F2/4
constant term    $\hat{\beta}_0$

(a) The estimated coefficient of $\beta_4$: $\hat{\beta}_4$

(b) The standard error of the estimated $\beta_4$: $\sigma_{\hat{\beta}_4}$

(c) The confidence intervals: $\hat{\beta}_4 \pm \sigma_{\hat{\beta}_4} \cdot t(\nu, \alpha)$, where $\nu$ is the residual degrees of freedom

(d) Partial F-Test: A measure of the significance of the last variable given that the remaining variables
are included.

11. Partial correlation of variables not included in current model (Regression):

A measure of the remaining linear correlation between the independent variables and the response variable.

12. Residual Analysis: to test the overall regression fit.

•  A list of the actual observations ($y_i$), predictions ($\hat{y}_i$), and the differences or residuals ($R_i$);

•  A list of standardized residuals, $(y_i - \hat{y}_i)/s$, to test for normality ($N(0, \sigma^2)$);

•  The autocorrelation function of the residuals ($R_i$) for independence and diagnostic testing; and

•  Plot of the residuals ($y_i - \hat{y}_i$).


y (observed)    predicted y    residual (y - ŷ)    N(0,1)
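
Several of the statistics in items 6 through 12 follow directly from the observed values, the fitted values, and the residual sums of squares. The sketch below is illustrative only and is not taken from any of the surveyed packages; the function names are hypothetical, the fit is assumed to be a least-squares fit with an intercept and K predictors, the critical t value is assumed to be supplied by the caller from a table, and the partial correlation formula covers only the case of one variable already in the model.

```python
import math

def anova_summary(y, y_hat, k):
    """ANOVA quantities for a least-squares fit with an intercept and k predictors."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((v - y_bar) ** 2 for v in y)              # total SS, N-1 df
    sse = sum((v - f) ** 2 for v, f in zip(y, y_hat))   # residual SS, N-K-1 df
    ssr = sst - sse                                     # regression SS, K df
    mse = sse / (n - k - 1)
    return {
        "SST": sst, "SSR": ssr, "SSE": sse,
        "MSR": ssr / k, "MSE": mse,
        "F": (ssr / k) / mse,      # overall F on (K, N-K-1) df
        "R2": ssr / sst,           # fraction of variation explained
        "s": math.sqrt(mse),       # standard deviation of residuals
    }

def sequential_f(sse_before, sse_after, n, k_after):
    """F statistic on (1, n-k_after-1) df for the single variable that just entered."""
    return (sse_before - sse_after) / (sse_after / (n - k_after - 1))

def coefficient_interval(beta_hat, std_err, t_crit):
    """Lower limit, upper limit, and partial F (the squared t ratio) for one coefficient."""
    half_width = t_crit * std_err
    return beta_hat - half_width, beta_hat + half_width, (beta_hat / std_err) ** 2

def partial_correlation(r_yx, r_yz, r_xz):
    """Correlation of y with a candidate x after removing one included variable z."""
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))

def residual_diagnostics(y, y_hat, s, lag=1):
    """Residuals, residuals scaled by s, and the lag autocorrelation of the residuals."""
    res = [v - f for v, f in zip(y, y_hat)]
    scaled = [r / s for r in res]
    r_bar = sum(res) / len(res)
    num = sum((res[t] - r_bar) * (res[t + lag] - r_bar) for t in range(len(res) - lag))
    den = sum((r - r_bar) ** 2 for r in res)
    return res, scaled, num / den
```

For a model with a single predictor, the sequential F for that predictor, its partial F, and the overall F in the ANOVA table all coincide, which gives a simple consistency check on a package's printed output.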


TABLE  1.  A  COMPARISON  TABLE  OF  COMPUTER  PACKAGES  (REGRESSION) 


k)  Partial correlation

l)  Sequential F-Test

m)  Multiple correlation coeff. ($R^2$)


III.  LINEAR  REGRESSION  PACKAGES 

Table  1  was  designed  with  the  specific  purpose  of  summarizing  the 
various  regression  subroutines  that  each  statistical  package  has  to  offer. 
(See  Appendix  also.)  However,  one  should  be  aware  that  in  some  of  these 
program  packages,  there  are  options  that  allow  one  to  obtain  additional 
characteristics  directly  or  indirectly.  Table  1  lists  the  various 
packages  and  their  primary  characteristics. 

In summary, most statistical packages are acceptable in terms of
performing linear regression. In fact, with the options plus other
subroutines, the level and flexibility of the analysis exceed the
requirements of most users.

APPENDIX. COMPUTER SOFTWARE PACKAGES


(Stepwise  Multiple  Linear  Regression  Programs) 


a.  Biomedical Computer Program (BMD/BMDP).

b.  International Mathematical & Statistical Libraries (IMSL).

c.  Statistical Package for the Social Sciences (SPSS).

d.  IBM Scientific Subroutine Package (SSP).

e.  SHARE Libraries (Daniel & Wood).

f.  MINITAB 80.

g.  Robust Statistical Estimation Package (ROSEPACK).

h.  BRL Stepwise Multiple Linear Regression.


a.  BMD (Biomedical Computer Program)

The Stepwise Regression subroutine (BMD02R) computes a sequence of
multiple linear regression equations in a stepwise manner. At each
step, one variable is added to the regression equation. The variable
added is the one which makes the greatest reduction in the error sum of
squares. In addition, variables can be forced into the regression
equation. Non-forced variables are automatically removed when their F-values
become too low. Regression equations with or without the zero intercept
may be selected. Plots of residuals are available in this package.
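
The entry rule described above (add the variable giving the greatest reduction in the error sum of squares, as long as its F value is large enough) can be sketched as follows. This is a minimal illustration, not the BMD02R code: the function names and the F-to-enter threshold of 4.0 are assumptions, forced variables and the removal step are omitted, and each selected variable is swept out of the remaining columns by simple orthogonalization.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _residualize(v, on):
    """Remove from v its least-squares projection on the vector `on`."""
    coef = _dot(v, on) / _dot(on, on)
    return [a - coef * b for a, b in zip(v, on)]

def forward_stepwise(columns, y, f_to_enter=4.0):
    """Return the indices of the predictor columns in the order they enter."""
    n = len(y)
    ones = [1.0] * n
    r_y = _residualize(y, ones)                       # response with the mean removed
    r_x = {j: _residualize(c, ones) for j, c in enumerate(columns)}
    selected = []
    sse = _dot(r_y, r_y)                              # error SS of the intercept-only model
    while r_x:
        # Reduction in error SS obtained by entering each remaining candidate
        gains = {j: _dot(v, r_y) ** 2 / _dot(v, v)
                 for j, v in r_x.items() if _dot(v, v) > 1e-12}
        if not gains:
            break
        best = max(gains, key=gains.get)
        df = n - len(selected) - 2                    # residual df after entry
        new_sse = sse - gains[best]
        if df <= 0:
            break
        f_value = float("inf") if new_sse <= 0 else gains[best] / (new_sse / df)
        if f_value < f_to_enter:
            break
        pivot = r_x.pop(best)
        r_y = _residualize(r_y, pivot)                # sweep the entered variable out
        r_x = {j: _residualize(v, pivot) for j, v in r_x.items()}
        selected.append(best)
        sse = new_sse
    return selected
```

Calling `forward_stepwise([x1, x2, x3], y)` with lists of equal length returns, for example, `[2, 0]` if the third column enters first and the first column second.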

BMDP (Biomedical Computer Program)

Program BMDP2R computes multiple linear regression in a stepwise
manner, entering the variable that best helps to predict y into the
regression equation at each step. This continues until the prediction of
y does not improve notably. Because the correlation matrix of the
predictors may be singular or nearly singular, the BMDP programs perform
the matrix inversion in a stepwise manner. A predictor variable is not
included in the regression equation if its squared multiple correlation
with the previously selected variables exceeds a certain value. Partial
correlation can be computed in BMDP6R; the correlation between each pair
of dependent variables is then computed after taking out the linear
effects of the set of independent variables. Scatter plots of observed
and predicted (expected) values of the dependent variable versus the
independent variable, and plots of residuals versus other variables, are
available in the regression programs (R-series, Regression-Series).

TABLE A1

                                     BMD                            BMDP

Programming Language                 FORTRAN                        FORTRAN
Approximate Size                     53 subroutines, 24 K           26 subroutines, 24 K
                                     (102 K); K = 1024
No. of Installations Using Package   Many installations             Many installations
Statistical Level                    Excellent (developed by the    Excellent (developed by the
                                     Dept. of Biomathematics,       Dept. of Biomathematics)
                                     UCLA)
Computational Level (Computer)       Health Science Computing       Four stepping algorithms
                                     Facility                       (double precision)
Documentation                        BMD (U of CA) User's Manual    BMDP (U of CA) User's Manual
Date Developed                       Jan 1973 ($8.25)               Jan 1975
Cost of Package                      -                              $1,000.00 per year


b.  IMSL (International Mathematical and Statistical Libraries, Inc.)*

IMSL is an extensive collection of mathematical and statistical subroutines
written in FORTRAN. The subroutines in the regression section were
designed to be useful in developing versatile application programs in the
following general areas: (1) simple linear regression, (2) multiple
linear regression, (3) stepwise linear regression, and (4) curvilinear
regression. These 27 subroutines integrate with other mathematical and
statistical routines or functions, allowing for a range from the simplest
to the most complex regression analysis. The system of subroutines makes
for a flexible system in regression analysis. IMSL is a system which
aids the user in making his own programs. At each step the critical F
values in subroutine RLSTP, for entering and deleting variables, change
to reflect the changing error degrees of freedom. The Jordan method of
reduction on the matrix (data) is performed. The regression package is
functionally divided into two groups: (1) linear models (RL), and (2)
special nonlinear models (RS). Subroutine RLSEP contains options for
(1) lack of fit and (2) partial F-Test (both the overall F-Test and the
partial F-Test for each term in the model are also performed). Routine
RLSTP is the stepwise (forward) algorithm with results available after
each step.

The library is available in seven computer versions.

* Two versions: the "in-core" version is designed to minimize usage of
central processing unit time; the "out-of-core" version is designed to
minimize core storage requirements. Each of the routines calculates
utilizing single and double precision.
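
As an illustration of Jordan reduction applied to a regression problem, the sketch below solves the normal equations by generic Gauss-Jordan elimination with partial pivoting. It is not the IMSL implementation; the function names, the pivoting strategy, and the singularity threshold are assumptions made for the example.

```python
def gauss_jordan_solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot_row][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        # Eliminate the pivot column from every other row (above and below)
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

def least_squares(columns, y):
    """Coefficients (intercept first) from the normal equations X'X b = X'y."""
    n = len(y)
    design = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return gauss_jordan_solve(xtx, xty)
```

For example, `least_squares([[1, 2, 3, 4, 5]], [1, 3, 2, 5, 4])` gives an intercept of 0.6 and a slope of 0.8, up to rounding.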


TABLE A2

                         IMSL

Programming Language     FORTRAN
Approximate Size         27 subroutines
No. of Installations     Many installations
Statistical Level        Excellent
Computational Level      Subroutine RLSEP is an expanded and easy-to-use
                         version of IMSL routine RLSTP (double precision)
Documentation            User's Manual
Date Developed           1977, revised January 1979
Cost                     $1,220 (1 May 77) one year, non-university;
                         universities, $988.00


c.  SPSS (Statistical Package for the Social Sciences)

Subprogram Regression uses a forward-selection stepwise technique.
Regression also allows the user to perform a regression procedure midway
between two extremes by allowing the program to choose the order of
introduction of the variables from a certain set, then force certain
other variables into the calculation, then proceed stepwise for a
period of time. There are 15 options available with subprogram Regression
(including the option for missing data; pairwise deletion of missing
data; ...; matrix input; output of means and standard deviations). There
are seven statistics available with subprogram Regression (correlation
matrix, mean, standard deviations, number of valid cases; forced printing
of the correlation matrix and removing of bad elements). Regression
techniques included are: (1) curvilinear and nonadditive models, (2)
regression with dummy variables, including analysis of variance and
covariance models, and (3) path analysis. Provisions for nonlinear
relationships (data transformation), examining polynomial trends,
interaction terms, etc. are included in this package. The SPSS package comes
in four versions: (1) IBM OS/370, (2) CDC 6000 and CYBER 70, (3) UNIVAC
1100 series, and (4) XEROX version.


TABLE A3

                         SPSS*                          SCSS (conversational version)

Programming Language     FORTRAN                        FORTRAN
Approximate Size         Workspace 70,000 bytes;        Unknown
                         space allocation 80,000
                         bytes
No. of Installations     Many                           Unknown
Statistical Level        Excellent (one major
                         subroutine, REGRESSION)
Computational Level      Good                           Good (double precision)
Date Developed           1970, 1975                     Fall 1979
Cost                     $2,000.00+                     -

* Allows for flexibility in the analysis.

+ Total package.

d.  SSP (IBM System/360 Scientific Subroutine Package)*

The Scientific Subroutine Package (SSP) is a set of basic computational,
statistical and mathematical FORTRAN subroutines, intended to
help the user develop his own packages necessary to solve problems. The
package has some 250 subroutines and a number of these are in the area
of regression analysis. The SSP system is best utilized with multiple
use of the subroutines, such as subroutines CORRE and STPRG.
There exist three modes of storage for a matrix: general, symmetric, and
diagonal. SSP has 15 main programs with input/output, control (parameter)
cards, and sample data. Three of these main programs are regression programs:
regression (REGRE), polynomial regression (POLRG) and stepwise regression
(STEPR). Double-precision versions of the three subroutines are
available. The Doolittle method is used in the stepwise regression subroutine.
The output of the stepwise multiple regression includes: (1) for all
data: means, standard deviations, and correlation coefficient matrix;
(2) for each step in the multiple regression: sum of squares reduced,
proportion reduced, cumulative sum of squares reduced, cumulative
proportion reduced, multiple correlation coefficient (adjusted and
unadjusted), F-Test for analysis of variance, standard error of estimate
(adjusted and unadjusted), regression coefficients, standard errors of
regression coefficients, and computed t-values; and (3) tables of
residuals.

* C.A. Bennett and N.L. Franklin, Statistical Analysis in Chemistry and the
Chemical Industry, John Wiley & Sons, 1954.


TABLE A4

                         SSP (Scientific Subroutine Package)

Programming Language     FORTRAN IV
Approximate Size         Over 250 FORTRAN subroutines (sample programs
                         32K byte (8K word))
No. of Installations     Over 300
Statistical Level        Average (standard)
Computational Level      Double precision
Documentation            IBM publication, 1970
Date Developed           March 1970 with updated versions
Cost                     Standard with IBM 360 systems (no cost)


e.  SHARE Library (Daniel & Wood)*

The linear least-squares program includes options for weighting,
detection of outliers and the standard analysis of variance table. The
fitted equation is printed with variable names, coefficients (B(I)),
t-values, and the minimum, maximum and range of each of the independent
variables. All standard statistics are listed, such as the residual root
mean square, residual mean square, residual sum of squares, total sum of
squares and multiple correlation coefficient squared. Also, residual
values are listed with observed and predicted values, plus cumulative
distribution plots of residuals, as standard output. The Mallows' Cp
statistic is presented as one method for comparing the fitted equations.
The User's Manual is available in the text (p. 278) written by C. Daniel
and F.S. Wood, where the restrictions** of the computer program are
presented. Some twenty (20) data transformations are available as part
of this linear regression program. On pages 310-311 of the text, an example
compares the precision of this regression program with that of other
commonly used least-squares programs.
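
For reference, Mallows' Cp for a subset equation can be computed from its residual sum of squares and the residual mean square of the full equation. The sketch below uses the usual textbook definition, Cp = SSE_p / s^2 - (N - 2p), with p counting the fitted parameters including the constant term; the function name and argument names are hypothetical and are not taken from the Daniel and Wood program.

```python
def mallows_cp(sse_subset, p, n, s2_full):
    """Mallows' Cp for a subset equation with p parameters (constant included).

    sse_subset: residual sum of squares of the subset equation
    n:          number of observations
    s2_full:    residual mean square of the equation containing all candidates
    """
    return sse_subset / s2_full - (n - 2 * p)
```

A subset equation with little bias has Cp close to p; for the full equation itself, Cp equals p exactly, since its residual sum of squares is s2_full times (n - p).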


TABLE A5

                         SHARE LIBRARY (Daniel & Wood)

Programming Language     FORTRAN IV
Approximate Size         One major subroutine
No. of Installations     Unknown
Statistical Level        Excellent
Computational Level      Very good
Documentation            User's Manual plus textbook (Fitting
                         Equations to Data)
Cost                     Under $100.00


* Available through SHARE Library, Triangle Universities Computation Center,
P.O. Box 12175, Research Triangle Park, NC 27709 (Number 360D-13.6.008).

** These restrictions may be altered by changing the dimension statements
of the computer program.

f.  MINITAB 80 (Pennsylvania State University)

An on-line "Help" facility, flexible transformations, interactive
and batch modes, and flexible plotting are standard. The MINITAB package
is interactive (time sharing) as well as batch. The literature states
that in a few hours, without help, a typical new user should be able
to start using MINITAB. The MINITAB package contains standard
correlation, regression, and analysis of variance. Stepwise regression
and general analysis of variance are new capabilities expected in the
near future (advanced version).


TABLE A6

                         MINITAB 80

Programming Language     FORTRAN IV
Approximate Size         Easy to install, "no difficulties." One day.
                         (Large: 80,000 words; overlayed: 12,000 words
                         (48K); 20,000 lines of FORTRAN)
No. of Installations     Over 300 installations
Statistical Level        Good (advanced version)
Computational Level      20,000 lines of FORTRAN; 5,000 lines of
                         comments (double precision)
Documentation            Well documented (Student Handbook,
                         Reference Manual and Implementation Guide)
Date Developed           Currently being developed. PA State U,
                         Dept. of Statistics
Cost                     $1000.00 per year (new)


g.  ROSEPACK (RObust Statistics Estimation Package, 1.0/2.0)

ROSEPACK is a system of portable FORTRAN subroutines to perform
iteratively reweighted least squares (IRLS) robust linear regression.
ROSEPACK contains 47 subroutines, a combination of numerical and
statistical methods employed to optimize problems in the sense of functions
of scaled residuals. Seven weighting functions are utilized in the
reweighting analysis. The robust regression is aimed at analyzing and
improving the behavior of least squares estimation when the disturbances
are not well behaved. One goal of robust regression is to avoid undue
influence on the fit if there are slight changes to all of the data or
large changes to a few of the data points. The stem and leaf technique,
gradient method, and orthogonal factorization are some of the methods
employed in the robust regression package. Work on ROSEPACK was started
in May 1975 at the Computer Research Center of the National Bureau of
Economic Research and was later tested at Hampshire College, Bell Labs,
and other universities (National Science Foundation Grants #DCR 75-08802,
MCS 76-11989, MCS 77-12514).

The residual scaling function used in ROSEPACK is the median absolute
deviation (the inclusion of other residual scaling functions is possible).
The weighting functions are: (a) Huber, (b) Andrews (sine), (c) biweight
(bisquare), (d) Cauchy, (e) Welsch, (f) Talwar (zero-one), (g) Fair,
(h) Logistic, and (i) user defined. The software, on tape, for the
iteratively reweighted least squares is available from IMSL (GNB Building,
7500 Bellaire Blvd., Houston, Texas 77036).
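
The idea of iteratively reweighted least squares with a Huber weighting function and a median-absolute-deviation scale can be sketched for the simplest case of a straight-line fit. This is an illustration only, not ROSEPACK code: the function names, the tuning constant 1.345, the consistency factor 0.6745, and the stopping rule are assumptions chosen for the example.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def huber_weight(u, c=1.345):
    """Huber weight: 1 for small scaled residuals, c/|u| beyond the cutoff c."""
    return 1.0 if abs(u) <= c else c / abs(u)

def irls_line(x, y, max_iter=50, tol=1e-8):
    """Robust straight-line fit by iteratively reweighted least squares."""
    a, b = weighted_line(x, y, [1.0] * len(x))        # start from ordinary least squares
    for _ in range(max_iter):
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = median(res)
        scale = median(abs(r - med) for r in res) / 0.6745   # MAD-based scale
        if scale < 1e-12:                             # essentially exact fit, nothing to reweight
            break
        w = [huber_weight(r / scale) for r in res]
        a_new, b_new = weighted_line(x, y, w)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b
```

With data lying close to a line except for one grossly discrepant point, the weight given to that point shrinks from one iteration to the next, so the fitted slope moves away from the ordinary least-squares value and toward the slope supported by the bulk of the data.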


TABLE A7

                                 ROSEPACK* 1.0                 ROSEPACK 2.0

Programming Language             FORTRAN                       FORTRAN
Approximate Size                 47 subroutines                Modular, mathematical
                                                               subroutines (63)
No. of Installations Using       Unknown
Program
Statistical Level                Data/numerical analysis
Computational Level (Computer)   (Double precision)
Documentation                    Limited** (on-line            Limited** (on-line
                                 documentation)                documentation);
                                                               17,500 lines of code
Developed                        May 1975                      March 1979
Cost                             Charge for each tape          IMSL $100.00
                                 $100.00

* ROSEPACK 1.0 and ROSEPACK 2.0 are available.

** ROSEPACK Staff Manager, MIT Center for Computational Research in
Economic and Management Science, 575 Technology Square, Cambridge,
MA 02139.

h.  HJBSLR (Harold J. Breaux; Stepwise Multiple Linear Regression).

This multiple regression program is patterned after M.A. Efroymson's
Gauss-Jordan algorithm (Mathematical Methods for Digital Computers, John Wiley
& Sons, Inc., 1960). The program utilizes a number of techniques for
easing the computation. It reads and translates a formula that
represents the linear model, reads the data, does the regression analysis,
and prints the formula that contains those terms that were finally included
in the regression model. It prints the coefficients ($\hat{\beta}_i$), prints
residuals if desired, and transforms the data. Confidence intervals on each
regression coefficient ($\beta_i$) are computed by the regression package.


TABLE A8

                         HJBSLR

Programming Language     FORTRAN IV
Approximate Size         Subroutine (regression)
No. of Installations     BRL, APG, MD
Statistical Level        Good
Computational Level      Double precision
Documentation            BRL Reports #1330, #1369
Date Developed           1965 (updated 1966, 1967)
Cost                     No cost.








